Skip to content

[JAVA-6159] Micrometer feedback #1940

Open
nhachicha wants to merge 9 commits intomongodb:mainfrom
nhachicha:nh/micrometer_spring_feedback
Open

[JAVA-6159] Micrometer feedback #1940
nhachicha wants to merge 9 commits intomongodb:mainfrom
nhachicha:nh/micrometer_spring_feedback

Conversation

@nhachicha
Copy link
Copy Markdown
Collaborator

@nhachicha nhachicha commented Apr 13, 2026

JAVA-6159

AI Usage Summary

Claude-Caude with Opus 4.6 1M Model

What AI did well

  • Research & synthesis — Fetched and synthesized feedback from 3 external sources (Jonatan's gist, Spring Boot issue, Spring Data ContextProviderFactory) into a structured plan
  • Test-first approach — Wrote failing tests before implementing fixes, following the user's guidance
  • Mechanical refactoring — Renaming imports, splitting enums, updating call sites across multiple files
  • Cross-referencing open source — Researched Lettuce, R2DBC, and Spring WebFlux patterns for reactive context propagation
  • Explaining concepts — Broke down Micrometer Observation vs Tracing API, scope mechanics, Reactor context propagation

Where the user corrected or pushed back

Situation What happened
Reducing tag duplication AI implemented a CommonLowCardinalityKeyNames refactor; user asked for suggestion only, not implementation — had to revert
Removing imperative tagging AI initially kept both imperative tags AND context fields (duplication); user spotted the redundancy and pushed to remove all imperative calls
tagHighCardinality in InternalStreamConnection AI claimed the remaining imperative calls were necessary; user pointed out they could go through getMongodbContext() instead
Scope in TransactionSpan AI put openScope() in the constructor (shared by sync+reactive); user asked why scope is public — led to discovering it should only be called from sync
Reactive context propagation AI spent multiple iterations trying contextWrite, contextCapture, AtomicReference patterns — all failed due to Mono.from(subscriber -> ...) pattern. User questioned
each attempt
Hooks.enableAutomaticContextPropagation AI assumed version alignment would fix it; user actually tried it and reported the hang, forcing deeper investigation
Reactive driver refactor scope After extensive async work, user asked if context-propagation=auto alone would suffice — AI confirmed, user decided to stash reactive changes and keep it simple
commit transaction reactive failure AI tried to skip the test; user rejected and demanded root cause investigation — led to finding the OTel bridge scope leak from TransactionSpan constructor
Modifying stopped observations AI removed parent cursor_id mutation; user showed Zipkin traces proving it broke nesting — led to reverting, then user suggested removing it properly since parenting
works via operationSpan.context()
Test validation gaps User commented out contextWrite and contextCapture to prove tests still passed — exposed that tests weren't actually validating the code they claimed to test
CheckStyle/SpotBugs AI fixed checkstyle issues; user asked to revert since they only wanted suggestions, not implementation

The driver used a single observation name ("mongodb") for both operation-level and command-level spans, which have different sets of low-cardinality tag keys. Prometheus requires all meters sharing a name to have identical tag key sets, causing the second observation type to be silently dropped.
Split MongodbObservation.MONGODB_OBSERVATION into MONGODB_OPERATION (name "mongodb.operation") and MONGODB_COMMAND (name "mongodb.command"), each declaring its own low-cardinality key set. Updated Tracer and TracingManager to pass the observation type through span creation.
@nhachicha nhachicha self-assigned this Apr 13, 2026
  Connection IDs, cursor IDs, session IDs, transaction numbers, and exception details were tagged as low-cardinality, causing unbounded
  Prometheus metric cardinality since their values change per-connection, per-cursor, or per-error. Moved CLIENT_CONNECTION_ID, SERVER_CONNECTION_ID, CURSOR_ID,TRANSACTION_NUMBER, SESSION_ID, EXCEPTION_MESSAGE, EXCEPTION_TYPE, and EXCEPTION_STACKTRACE from CommandLowCardinalityKeyNames to HighCardinalityKeyNames so they appear only in traces, not in metrics.
  Added tagHighCardinality(KeyValue) and tagHighCardinality(KeyValues) to the Span interface to support string-valued high-cardinality tags alongside the existing BsonDocument overload.
The query text max length configuration constant was stored in every Observation.Context and extracted back in the MicrometerSpan constructor. This value never changes between observations and is not output as any signal. Pass it directly via constructor parameter instead.
Observations were created with Micrometer's generic SenderContext, preventing users from filtering or customizing MongoDB observations
by context type. This blocks the ObservationConvention pattern that Spring Boot needs for tag alignment.
Introduced MongodbContext extending SenderContext<Object> with Kind.CLIENT, giving users a MongoDB-specific type
to register ObservationHandler<MongodbContext> or ObservationConvention<MongodbContext>
instances scoped to only MongoDB observations.
…f TracingManager

Replaced all imperative tagLowCardinality/tagHighCardinality calls with a convention-based approach. TracingManager and InternalStreamConnection now populate domain fields on MongodbContext, and DefaultMongodbObservationConvention reads those fields at stop time to produce the final key-values.

This decouples tag naming from span creation, enabling users to register
a GlobalObservationConvention<MongodbContext> to customize tag names for
their environment (e.g. Spring Boot tag alignment with their existing
DefaultMongoCommandTagsProvider).

Added domain fields to MongodbContext: observationType, commandName,
databaseName, collectionName, serverAddress, connectionId, cursorId,
transactionNumber, sessionId, queryText, responseStatusCode.

Removed tagLowCardinality/tagHighCardinality from the Span interface
as they are no longer used.
@nhachicha nhachicha force-pushed the nh/micrometer_spring_feedback branch from 9b4b0ad to bf91627 Compare April 14, 2026 09:48
Update attribute name for OpenTelemetry
The driver called observation.start()/stop() but never openScope()/
closeScope(). Without scopes, registry.getCurrentObservation() returned
null during MongoDB operations, breaking context propagation for any
downstream code (Spring interceptors, user observations, MDC logging).

For example, in withTransaction, a user observation created inside the
callback would attach to the Spring HTTP parent instead of the MongoDB
transaction span, because the transaction observation was never made "current" on the thread.
Added openScope()/closeScope() to the Span interface with scope lifecycle management in MongoClusterImpl (operation spans), InternalStreamConnection (command spans), and TransactionSpan.
When a getMore command arrived, the cursor_id was being set on the
parent operation span's MongodbContext even though the parent observation
was already stopped. Modifying an observation after stop() is undefined
behavior in Micrometer
@nhachicha nhachicha changed the title WIP Micrometer feedback [JAVA-6159] Micrometer feedback Apr 16, 2026
@nhachicha nhachicha marked this pull request as ready for review April 16, 2026 14:58
@nhachicha nhachicha requested a review from a team as a code owner April 16, 2026 14:58
@nhachicha nhachicha requested review from Copilot and rozza April 16, 2026 14:58
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors Micrometer-based tracing to align better with OpenTelemetry/Micrometer conventions by separating operation vs command observations, moving tag production into an ObservationConvention, and introducing explicit span scope management in sync execution paths.

Changes:

  • Split MongoDB observations into distinct operation- and command-level observation types to avoid tag-keyset collisions (e.g., Prometheus restrictions).
  • Replace imperative tagging with a MongodbContext + DefaultMongodbObservationConvention that derives tags from populated domain fields.
  • Add explicit openScope() / closeScope() lifecycle handling for spans in sync execution code paths and update unified test modifications accordingly.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
driver-sync/src/test/functional/com/mongodb/client/unified/UnifiedTestModifications.java Updates unified test skip rules for OpenTelemetry/Micrometer-related specs.
driver-sync/src/test/functional/com/mongodb/client/unified/Entities.java Minor formatting adjustment in Micrometer observability settings setup.
driver-sync/src/test/functional/com/mongodb/client/observability/SpanTree.java Updates test tag-key imports to match new low/high-cardinality key enums.
driver-sync/src/main/com/mongodb/client/internal/MongoClusterImpl.java Opens/closes span scope around sync operation execution.
driver-sync/src/main/com/mongodb/client/internal/ClientSessionImpl.java Opens transaction-span scope when starting a transaction (sync).
driver-core/src/main/com/mongodb/internal/observability/micrometer/TransactionSpan.java Ensures transaction span scope is closed before ending in finalization paths; adds openScope() API.
driver-core/src/main/com/mongodb/internal/observability/micrometer/TracingManager.java Switches span creation to operation vs command observation types and populates MongodbContext fields for conventions.
driver-core/src/main/com/mongodb/internal/observability/micrometer/Tracer.java Updates tracer API to accept an observation type when creating spans.
driver-core/src/main/com/mongodb/internal/observability/micrometer/Span.java Replaces tag APIs with scope management + query-text setting + context access.
driver-core/src/main/com/mongodb/internal/observability/micrometer/MongodbObservation.java Splits key names into operation vs command low-cardinality sets and reorganizes high-cardinality keys.
driver-core/src/main/com/mongodb/internal/observability/micrometer/MongodbContext.java Introduces a MongoDB-specific Micrometer SenderContext to hold domain fields used by conventions.
driver-core/src/main/com/mongodb/internal/observability/micrometer/MicrometerTracer.java Uses MongodbContext, registers the default convention, and implements new Span API.
driver-core/src/main/com/mongodb/internal/observability/micrometer/DefaultMongodbObservationConvention.java New default global convention that emits tags from MongodbContext fields (including errors).
driver-core/src/main/com/mongodb/internal/connection/InternalStreamConnection.java Uses new Span API, populates query text/status code via MongodbContext, and opens/closes scope around command spans.
config/checkstyle/suppressions.xml Moves the printStackTrace suppression to the new convention class.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 427 to 433
Span span = operationContext.getTracingManager().createOperationSpan(
actualClientSession.getTransactionSpan(), operationContext, operation.getCommandName(), operation.getNamespace());
if (span != null) {
span.openScope();
}
ReadBinding binding = getReadBinding(readPreference, actualClientSession, implicitSession);

Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

span.openScope() is called before getReadBinding(...). If getReadBinding throws (e.g., cluster/binding construction failure), the scope will remain open because the finally block is never entered. Wrap scope open/close in an outer try/finally (or move openScope() inside the existing try) so closeScope() is guaranteed to run even when binding creation fails.

Copilot uses AI. Check for mistakes.
Comment on lines 466 to 472
Span span = operationContext.getTracingManager().createOperationSpan(
actualClientSession.getTransactionSpan(), operationContext, operation.getCommandName(), operation.getNamespace());
if (span != null) {
span.openScope();
}
WriteBinding binding = getWriteBinding(actualClientSession, isImplicitSession(session));

Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

span.openScope() is called before getWriteBinding(...). If binding creation throws, the scope is leaked because closeScope() is only invoked in the finally that runs after operation.execute(...). Restructure to ensure closeScope() executes for all paths after openScope(), including failures during binding acquisition.

Copilot uses AI. Check for mistakes.
Comment on lines 457 to 480
@@ -473,14 +476,16 @@ private <T> T sendAndReceiveInternal(final CommandMessage message, final Decoder
commandEventSender = new NoOpCommandEventSender();
}
if (isTracingCommandPayloadNeeded) {
tracingSpan.tagHighCardinality(QUERY_TEXT.asString(), commandDocument);
tracingSpan.setQueryText(commandDocument);
}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tracingSpan.openScope() happens before command document hydration and before the sendCommandMessage try/catch. If any exception is thrown between openScope() and the later close sites (e.g., message.getCommandDocument(bsonOutput), event sender construction), the scope will leak. Enclose the entire post-openScope() section in a try/finally that closes the scope (and ends the span as appropriate) on all exceptional paths.

Copilot uses AI. Check for mistakes.
Comment on lines +63 to +66
// Register default convention. Users can override by registering their own GlobalObservationConvention
// after the MongoClient is created — the last matching convention wins.
DefaultMongodbObservationConvention defaultConvention = new DefaultMongodbObservationConvention();
observationRegistry.observationConfig().observationConvention(defaultConvention);
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Registering DefaultMongodbObservationConvention in the MicrometerTracer constructor mutates the user-provided ObservationRegistry and will add another convention each time a MongoClient/MicrometerTracer is created with the same registry. This can lead to unbounded growth and can unintentionally override user conventions depending on Micrometer's selection order. Consider making registration idempotent (detect existing convention), or registering the convention once at a higher level rather than per-tracer instance.

Copilot uses AI. Check for mistakes.
Comment on lines +207 to +210
// Populate domain fields on MongodbContext — the convention reads these to produce tags
MongodbContext mongodbContext = span.getMongodbContext();
if (mongodbContext != null) {
mongodbContext.setCommandName(commandName);
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method now relies on populating MongodbContext (and a convention at observation stop time) rather than directly attaching all of the tags described in the method Javadoc above. Please update the Javadoc accordingly so it reflects the new tagging mechanism and doesn’t reference tags that are no longer set here.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants